<!DOCTYPE html>
<html lang="en">
<head>
	<meta charset="UTF-8">
	<meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
	<title>Anatomy of an analyzer | ElasticSearch 7.7 权威指南中文版</title>
	<meta name="keywords" content="ElasticSearch 权威指南中文版, elasticsearch 7, es7, 实时数据分析，实时数据检索" />
    <meta name="description" content="ElasticSearch 权威指南中文版, elasticsearch 7, es7, 实时数据分析，实时数据检索" />
    <!-- Give IE8 a fighting chance -->
    <!--[if lt IE 9]>
    <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
    <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
    <![endif]-->
	<link rel="stylesheet" type="text/css" href="../static/styles.css" />
	<script>
	var _link = 'analyzer-anatomy.html';
    </script>
</head>
<body>
<div class="main-container">
    <section id="content">
        <div class="content-wrapper">
            <section id="guide" lang="zh_cn">
                <div class="container">
                    <div class="row">
                        <div class="col-xs-12 col-sm-8 col-md-8 guide-section">
                            <div style="color:gray; word-break: break-all; font-size:12px;">原英文版地址: <a href="https://www.elastic.co/guide/en/elasticsearch/reference/7.7/analyzer-anatomy.html" rel="nofollow" target="_blank">https://www.elastic.co/guide/en/elasticsearch/reference/7.7/analyzer-anatomy.html</a>, 原文档版权归 www.elastic.co 所有<br/>本地英文版地址: <a href="../en/analyzer-anatomy.html" rel="nofollow" target="_blank">../en/analyzer-anatomy.html</a></div>
                        <!-- start body -->
                  <div class="page_header">
<strong>重要</strong>: 此版本不会发布额外的bug修复或文档更新。最新信息请参考 <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html" rel="nofollow">当前版本文档</a>。
</div>
<div id="content">
<div class="breadcrumbs">
<span class="breadcrumb-link"><a href="index.html">Elasticsearch Guide [7.7]</a></span>
»
<span class="breadcrumb-link"><a href="analysis.html">Text analysis</a></span>
»
<span class="breadcrumb-link"><a href="analysis-concepts.html">Text analysis concepts</a></span>
»
<span class="breadcrumb-node">Anatomy of an analyzer</span>
</div>
<div class="navheader">
<span class="prev">
<a href="analysis-concepts.html">« Text analysis concepts</a>
</span>
<span class="next">
<a href="analysis-index-search-time.html">Index and search analysis »</a>
</span>
</div>
<div class="section">
<div class="titlepage"><div><div>
<h2 class="title">
<a id="analyzer-anatomy"></a>Anatomy of an analyzer<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch/edit/7.7/docs/reference/analysis/anatomy.asciidoc">edit</a>
</h2>
</div></div></div>
<p>An <em>analyzer</em>  — whether built-in or custom — is just a package which
contains three lower-level building blocks: <em>character filters</em>,
<em>tokenizers</em>, and <em>token filters</em>.</p>
<p>The built-in <a class="xref" href="analysis-analyzers.html" title="Built-in analyzer reference">analyzers</a> pre-package these building
blocks into analyzers suitable for different languages and types of text.
Elasticsearch also exposes the individual building blocks so that they can be
combined to define new <a class="xref" href="analysis-custom-analyzer.html" title="Create a custom analyzer"><code class="literal">custom</code></a> analyzers.</p>
<div class="section">
<div class="titlepage"><div><div>
<h3 class="title">
<a id="analyzer-anatomy-character-filters"></a>Character filters<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch/edit/7.7/docs/reference/analysis/anatomy.asciidoc">edit</a>
</h3>
</div></div></div>
<p>A <em>character filter</em> receives the original text as a stream of characters and
can transform the stream by adding, removing, or changing characters.  For
instance, a character filter could be used to convert Hindu-Arabic numerals
(٠‎١٢٣٤٥٦٧٨‎٩‎) into their Arabic-Latin equivalents (0123456789), or to strip HTML
elements like <code class="literal">&lt;b&gt;</code> from the stream.</p>
<p>An analyzer may have <span class="strong strong"><strong>zero or more</strong></span> <a class="xref" href="analysis-charfilters.html" title="Character filters reference">character filters</a>,
which are applied in order.</p>
</div>

<div class="section">
<div class="titlepage"><div><div>
<h3 class="title">
<a id="analyzer-anatomy-tokenizer"></a>Tokenizer<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch/edit/7.7/docs/reference/analysis/anatomy.asciidoc">edit</a>
</h3>
</div></div></div>
<p>A <em>tokenizer</em>  receives a stream of characters, breaks it up into individual
<em>tokens</em> (usually individual words), and outputs a stream of <em>tokens</em>. For
instance, a <a class="xref" href="analysis-whitespace-tokenizer.html" title="Whitespace Tokenizer"><code class="literal">whitespace</code></a> tokenizer breaks
text into tokens whenever it sees any whitespace.  It would convert the text
<code class="literal">"Quick brown fox!"</code> into the terms <code class="literal">[Quick, brown, fox!]</code>.</p>
<p>The tokenizer is also responsible for recording the order or <em>position</em> of
each term and the start and end <em>character offsets</em> of the original word which
the term represents.</p>
<p>An analyzer must have <span class="strong strong"><strong>exactly one</strong></span> <a class="xref" href="analysis-tokenizers.html" title="Tokenizer reference">tokenizer</a>.</p>
</div>

<div class="section">
<div class="titlepage"><div><div>
<h3 class="title">
<a id="analyzer-anatomy-token-filters"></a>Token filters<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch/edit/7.7/docs/reference/analysis/anatomy.asciidoc">edit</a>
</h3>
</div></div></div>
<p>A <em>token filter</em> receives the token stream and may add, remove, or change
tokens.  For example, a <a class="xref" href="analysis-lowercase-tokenfilter.html" title="Lowercase token filter"><code class="literal">lowercase</code></a> token
filter converts all tokens to lowercase, a
<a class="xref" href="analysis-stop-tokenfilter.html" title="Stop token filter"><code class="literal">stop</code></a> token filter removes common words
(<em>stop words</em>) like <code class="literal">the</code> from the token stream, and a
<a class="xref" href="analysis-synonym-tokenfilter.html" title="Synonym token filter"><code class="literal">synonym</code></a> token filter introduces synonyms
into the token stream.</p>
<p>Token filters are not allowed to change the position or character offsets of
each token.</p>
<p>An analyzer may have <span class="strong strong"><strong>zero or more</strong></span> <a class="xref" href="analysis-tokenfilters.html" title="Token filter reference">token filters</a>,
which are applied in order.</p>
</div>

</div>
<div class="navfooter">
<span class="prev">
<a href="analysis-concepts.html">« Text analysis concepts</a>
</span>
<span class="next">
<a href="analysis-index-search-time.html">Index and search analysis »</a>
</span>
</div>
</div>

                  <!-- end body -->
                        </div>
                        <div class="col-xs-12 col-sm-4 col-md-4" id="right_col">
                        
                        </div>
                    </div>
                </div>
            </section>
        </div>
    </section>
</div>
<script src="../static/cn.js"></script>
</body>
</html>